Scene Graph Generation


Scene graph generation is the task of building a structured representation of a scene in which objects are nodes and the relationships between them are edges.

CausalNav: A Long-term Embodied Navigation System for Autonomous Mobile Robots in Dynamic Outdoor Scenarios

Jan 05, 2026

GenCAMO: Scene-Graph Contextual Decoupling for Environment-aware and Mask-free Camouflage Image-Dense Annotation Generation

Jan 03, 2026

Instance Communication System for Intelligent Connected Vehicles: Bridging the Gap from Semantic to Instance-Level Transmission

Dec 27, 2025

SparScene: Efficient Traffic Scene Representation via Sparse Graph Learning for Large-Scale Trajectory Generation

Dec 24, 2025

Object-Centric Framework for Video Moment Retrieval

Dec 20, 2025

LangDriveCTRL: Natural Language Controllable Driving Scene Editing with Multi-modal Agents

Dec 19, 2025

Bridging Modalities and Transferring Knowledge: Enhanced Multimodal Understanding and Recognition

Dec 23, 2025

SNOW: Spatio-Temporal Scene Understanding with World Knowledge for Open-World Embodied Reasoning

Dec 18, 2025

Seeing is Believing (and Predicting): Context-Aware Multi-Human Behavior Prediction with Vision Language Models

Dec 17, 2025

LINA: Learning INterventions Adaptively for Physical Alignment and Generalization in Diffusion Models

Dec 15, 2025